An Alternative Method to Remove Duplicate Tuples

Author

NICHOLAS T. KARONIS

Abstract

The problem of performing database operations on parallel architectures has received much attention, both as an applied and as a theoretical area of research. Much of this attention has focused on performing these operations on distributed-memory architectures, for example, a hypercube. Algorithms that perform relational database operations on a hypercube typically exploit the hypercube's unique interconnectivity not only to process the relational operators efficiently but also to perform dynamic load balancing. Certain relational operators (e.g., projection and union) can produce interim relations that contain duplicate tuples. As a result, an algorithm for a relational database system must address the issue of removing duplicate tuples from these interim relations. These hypercube algorithms accomplish this by compacting the relation into hypercubes of smaller and smaller dimensions. We present an alternative method for removing duplicate tuples from a relation that is distributed over a hypercube by using the embedded ring found in every hypercube. Through theoretical analysis of the algorithm and empirical observation, we demonstrate that using the ring to remove the duplicate tuples is significantly more efficient than using the hypercube.
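The abstract does not spell out the ring algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes the ring is the standard binary-reflected Gray-code ring (every d-dimensional hypercube contains one, and consecutive ring nodes differ in one address bit, so every ring edge is a physical hypercube link). It also assumes a simple circulation scheme chosen for illustration: each node's locally deduplicated block travels once around the ring, and a tuple is kept only at the holder with the smallest ring position. The function names `embedded_ring` and `ring_remove_duplicates` are hypothetical, and this is not necessarily the paper's method. It is a sequential simulation, not a message-passing implementation.

```python
from typing import Hashable, List, Set


def gray(i: int) -> int:
    """Binary-reflected Gray code: maps ring position i to a hypercube node label."""
    return i ^ (i >> 1)


def embedded_ring(d: int) -> List[int]:
    """Node labels of a d-dimensional hypercube in ring order.

    Consecutive labels (including last -> first) differ in exactly one bit,
    so every ring edge is a direct hypercube link.
    """
    return [gray(i) for i in range(1 << d)]


def ring_remove_duplicates(d: int, local: List[List[Hashable]]) -> List[Set[Hashable]]:
    """Hypothetical ring-circulation duplicate removal (sequential simulation).

    local[n] holds the tuples stored at hypercube node n (indexed by node label).
    Returns per-node tuple sets in which each distinct tuple survives only at
    the holder with the smallest ring position.
    """
    p = 1 << d
    ring = embedded_ring(d)
    # Step 0: each node removes its own local duplicates (indexed by ring position).
    kept = [set(local[ring[pos]]) for pos in range(p)]
    original = [frozenset(s) for s in kept]  # the blocks that circulate
    # Round k: every block has moved k hops along the ring, so the node at
    # ring position pos holds the block that originated at (pos - k) mod p.
    for k in range(1, p):
        for pos in range(p):
            origin = (pos - k) % p
            if origin < pos:
                # An earlier ring position also holds these tuples; drop ours.
                kept[pos] -= original[origin]
    # Map results back from ring position to hypercube node label.
    result: List[Set[Hashable]] = [set() for _ in range(p)]
    for pos in range(p):
        result[ring[pos]] = kept[pos]
    return result
```

For example, with `d = 2` the ring visits nodes `[0, 1, 3, 2]`. If nodes 0 and 3 both hold the tuple `("a", 1)`, only node 0 (ring position 0) keeps it. The scheme uses p - 1 rounds that each touch only ring-neighbor links. The sketch says nothing about the cost model or measurements reported in the paper.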


Similar resources

Semantic Management of Deduplicate Tuples in the Relational Databases

A relational database is a collection of relations. Duplicate tuples are common in many real-time relational databases. In a relational database, if the same real-world entity is represented by more than one tuple, then such tuples are called duplicate tuples. Finding duplicate tuples and then replacing them by one best tuple is called a fusion operation. Whenever duplicate tuples are fou...

Approximate Joins for Relational Data

Krommydas, Ioannis, Evagelos, Georgia. MSc, Computer Science Department, University of Ioannina, Greece. June, 2008. Approximate Joins for Relational Data. Thesis Supervisor: Vassiliadis Panos. Relational databases often contain duplicate data entries. This may occur due to a variety of reasons, such as typographical errors, multiple conventions for recording database fields or other noise sour...

An Alternative Secondary Goal Approach to Modify Cross Efficiency Evaluation in Data Envelopment Analysis

The cross efficiency evaluation is used for performance measurement of decision making units in data envelopment analysis. One of the most important shortcomings of this method is the existence of alternative optimal solutions, and therefore the efficiency scores are not unique. We are going to summarize the previous models proposed by researchers and suggest an alternative secondary goal approach...

Eliminating Fuzzy Duplicates in Data Warehouses

The duplicate elimination problem of detecting multiple tuples, which describe the same real-world entity, is an important data cleaning problem. Previous domain-independent solutions to this problem relied on standard textual similarity functions (e.g., edit distance, cosine metric) between multi-attribute tuples. However, such approaches ...

Indeterministic Handling of Uncertain Decisions in Duplicate Detection

In current research, duplicate detection is usually considered as a deterministic approach in which tuples are either declared as duplicates or not. However, most often it is not completely clear whether two tuples represent the same real-world entity or not. In deterministic approaches, however, this uncertainty is ignored, which in turn can lead to false decisions. In this paper, we present a...

Publication date: 2007
